Amendments to the Claims:
This listing of claims replaces all prior versions and listings of claims in the application: 

Listing of Claims: 

1. (currently amended) A method for generating an identification signal, comprising:

accepting as input a monophonic audio signal of limited duration; 

translating said monophonic audio signal to a representation of a series of discrete tones; and

producing a control signal from said representation of discrete tones, said control signal 
suitable for causing a transponder to generate a signal, 

where said generated signal is human recognizable as a translation of said monophonic
audio signal;

wherein translating said monophonic audio signal to the representation of the series of
discrete tones includes segmenting the monophonic audio signal into a series of segments
according to time varying features of the audio signal that include a feature associated with
energy and a feature associated with spectral composition, wherein each tone in the series of
discrete tones is associated with a different segment in the series of segments.

2. (currently amended) A method for generating an identification signal, comprising:

accepting as input a voice signal of limited duration; 

translating said voice signal to a representation of a series of discrete tones; and 
producing a control signal from said representation of discrete tones, said control signal 
suitable for causing a transponder to generate a signal, 

where said generated signal is human recognizable as a translation of said voice signal;

wherein translating said voice signal to the representation of the series of discrete tones
includes segmenting the voice signal into a series of segments according to time varying features
of the voice signal that include a feature associated with energy and a feature associated with
spectral composition, wherein each tone in the series of discrete tones is associated with a
different segment in the series of segments.

3. (original) The method of claim 2 wherein said generated signal is melodically
human-recognizable.

4. (original) The method of claim 2 wherein said generated signal is rhythmically
human-recognizable.

5. (original) The method of claim 2 wherein accepting as input further comprises
receiving said voice signal over a telephone connection.

6. (original) The method of claim 5 wherein said telephone connection is wireless.

7. (original) The method of claim 2 wherein said step of accepting as input further 
comprises receiving said voice signal over a microphone attached to a computer. 

8. (original) The method of claim 2 wherein said translating step further comprises 
translating said voice signal to a range of tones within the capability of a mobile telephone audio 
output synthesizer. 

9. (original) The method of claim 2 further comprising the step of transmitting said 
control signal to a tone-producing output device responsive to said control signal.

10. (original) The method of claim 2 wherein said translating step further comprises
the steps of:

generating a digital representation of said voice signal; 
dividing said digitized signal into a plurality of frames; 
extracting analysis data from each said frame; and 
formatting said analysis data into a frame representation. 

11. (currently amended) A method for generating an identification
signal, comprising:

accepting as input a voice signal of limited duration;

translating said voice signal to a representation of a series of discrete tones, including
generating a digital representation of said voice signal, dividing said digitized signal into a
plurality of frames, extracting analysis data from each said frame, and formatting said analysis
data into a frame representation; and

producing a control signal from said representation of discrete tones, said control signal
suitable for causing a transponder to generate a signal, where said generated signal is
human-recognizable as a translation of said voice signal;

wherein said frame representation further comprises a plurality of signal parameters 
including a time-domain energy measure, a fundamental frequency value, cepstral coefficients, 
and a cepstral-domain energy measure. 

12. (original) The method of claim 11 further comprising the step of determining said
time-domain energy measure by multiplying the signal in a selected frame with a mean removed 
by a window function, summing the square of the result, and normalizing the summed square by 
the number of samples in said selected frame. 
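
For illustration only, the computation recited in claim 12 might be sketched as follows. The function names and the choice of a Hann window (one example of a unimodal window) are assumptions made for this sketch and are not taken from the application.

```python
import math

def hann_window(n):
    """Hann window of length n (an assumed choice of unimodal window)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def time_domain_energy(frame):
    """Energy of one frame: remove the mean, apply the window,
    sum the squares, and normalize by the number of samples."""
    n = len(frame)
    mean = sum(frame) / n
    window = hann_window(n)
    return sum(((s - mean) * w) ** 2 for s, w in zip(frame, window)) / n
```

A constant frame yields an energy of zero under this sketch, since removing the mean leaves nothing to window.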

13. (original) The method of claim 12 wherein said window function is a unimodal 
window function.

14. (original) The method of claim 11 further comprising the step of determining a
fundamental frequency of a selected frame by determining the lowest significant periodic 
component of the signal of said selected frame. 

15. (original) The method of claim 11 further comprising the step of determining
cepstral coefficients of a selected frame by computing the inverse discrete Fourier transform of
the complex natural logarithm of the short-time discrete Fourier transform of the signal of a
selected frame, said signal windowed by a window function. 
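
For illustration only, the sequence of operations in claim 15 (window, discrete Fourier transform, complex natural logarithm, inverse transform) might be sketched as below. This is a minimal sketch under stated assumptions: the Hann window, the naive O(n²) transform, the `floor` guard against taking the logarithm of zero, and the use of the principal value of the complex logarithm (no phase unwrapping) are all choices made here, not details from the application.

```python
import cmath
import math

def hann(n):
    """Hann window (assumed); requires n >= 2."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse is scaled by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def cepstral_coefficients(frame, floor=1e-12):
    """Inverse DFT of the complex log of the DFT of the windowed frame."""
    windowed = [s * w for s, w in zip(frame, hann(len(frame)))]
    spectrum = dft(windowed)
    # Replace near-zero bins with a small floor so the log is defined.
    log_spectrum = [cmath.log(c if abs(c) > floor else floor) for c in spectrum]
    return dft(log_spectrum, inverse=True)
```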

16. (original) The method of claim 11 further comprising the step of determining said
cepstral-domain energy measure by determining a short-time cepstral gain with the
mean value removed, said short-time cepstral gain normalized by the maximum gain over all 
frames. 

17. (original) The method of claim 11 further comprising the step of determining
short-term averages of said plurality of signal parameters. 

18. (original) The method of claim 17 further comprising the step of determining each
said short-term average over three consecutive frames. 

19. (original) The method of claim 17 further comprising the step of determining
creating ordinal vectors encoding the number of frames in which directionality of change as
determined by said short-term averages, remains the same.

20. (original) The method of claim 19 wherein said ordinal vectors further comprise a 
count of consecutive upward short-term average change in cepstral-domain energy, a count of 
consecutive downward short-term average change in cepstral-domain energy, a count of
consecutive upward short-term average change in fundamental frequency, and a count of
consecutive downward short-term average change in fundamental frequency. 

21. (original) The method of claim 20 further comprising the step of determining each 
count for each frame in said signal. 
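
For illustration only, the short-term averages of claim 18 and the per-frame counts of claims 19 to 21 might be sketched as follows. The trailing three-frame window, the handling of the first frames, and the rule that an unchanged value resets both counts are assumptions of this sketch.

```python
def short_term_average(values, width=3):
    """Average each frame with up to (width - 1) preceding frames."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

def run_counts(averages):
    """Per-frame counts of consecutive upward and downward changes."""
    up, down = [0], [0]
    for prev, cur in zip(averages, averages[1:]):
        if cur > prev:
            up.append(up[-1] + 1)
            down.append(0)
        elif cur < prev:
            down.append(down[-1] + 1)
            up.append(0)
        else:
            # Assumed: no change resets both counts.
            up.append(0)
            down.append(0)
    return up, down
```

Applied separately to cepstral-domain energy and to fundamental frequency, `run_counts` would give the four counts listed in claim 20.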

22. (original) The method of claim 10 further comprising the step of segmenting said 
signal by counting instances of increased signal amplitude in said frames, and 

for each instance of increased amplitude, determining a change in each of pitch, energy, 
and spectral composition in a region around said instance of increased amplitude, 

whereby a segment is defined by a start frame having an instance of increased amplitude 
and an end frame is defined by changes in pitch, energy and spectral composition in relation to 
selected thresholds. 

23. (original) The method of claim 10 wherein said translating step further comprises
grouping said frames into a plurality of regions. 

24. (original) The method of claim 23 wherein each said region is determined from a 
count of consecutive upward short-term average change in cepstral-domain energy followed by a 
count of consecutive downward short-term average change in cepstral-domain energy. 

25. (original) The method of claim 23 further comprising the step of determining the 
existence of a candidate note start frame in each said region. 

26. (original) The method of claim 24 further comprising the step of determining a 
candidate note start frame in each said region as the last frame within said region in which the 
count of consecutive upward short-term average change in cepstral-domain energy is not zero.

27. (original) The method of claim 25 further comprising the step of determining
which regions of said plurality have a valid note start frame. 

28. (original) The method of claim 25, wherein determining a candidate note start
frame further comprises the step of determining if the cepstral domain energy of a particular 
frame is greater than a cepstral domain energy threshold and a frame immediately before said 
particular frame was below said cepstral domain energy threshold. 

29. (original) The method of claim 25, wherein determining a candidate note start 
frame further comprises the step of determining whether a fundamental frequency range of a 
particular frame is above a fundamental frequency range threshold and whether an energy range 
for said particular frame is above an energy range threshold. 

30. (original) The method of claim 25, further comprising the step of determining a 
stop frame corresponding to each start frame. 

31. (original) The method of claim 26, further comprising the step of determining a
stop frame by locating the first frame after a start frame in which cepstral energy is below said 
cepstral domain energy threshold. 
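
For illustration only, the threshold tests of claims 28 and 31 might be sketched as below. The function names and the representation of the signal as a list of per-frame cepstral-domain energies are assumptions of this sketch; returning `None` when no stop frame is found is also an assumption, and the fallback of claim 32 is not implemented here.

```python
def find_start_frames(energy, threshold):
    """Frames whose energy exceeds the threshold while the
    immediately preceding frame was below it."""
    return [i for i in range(1, len(energy))
            if energy[i] > threshold and energy[i - 1] < threshold]

def find_stop_frame(energy, start, threshold):
    """First frame after `start` whose energy is below the threshold,
    or None if there is no such frame."""
    for i in range(start + 1, len(energy)):
        if energy[i] < threshold:
            return i
    return None
```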

32. (original) The method of claim 31, further comprising the step of defining the stop
frame as a frame between two and ten frames before a subsequent start frame if no frame having 
cepstral energy below said cepstral domain energy threshold is found.

33. (original) The method of claim 30 further comprising the step of verifying each
start and stop frame pair by determining whether 

a) average voicing probability is above a voicing probability threshold, 

b) average short-time energy is above an average short-time energy threshold, and 

c) average fundamental frequency is above an average fundamental frequency threshold.

34. (currently amended) A method for generating an identification signal, comprising:

accepting as input a voice signal of limited duration; 

translating said voice signal to a representation of a series of discrete tones, including 
generating a digital representation of said voice signal, dividing said digitized signal into a 
plurality of frames, extracting analysis data from each said frame, formatting said analysis data 
into a frame representation, and grouping said frames into a plurality of regions;

producing a control signal from said representation of discrete tones, said control signal 
suitable for causing a transponder to generate a signal where said generated signal is
human-recognizable as a translation of said voice signal;

determining the existence of a candidate note start frame in each said region; 

determining a stop frame corresponding to each start frame;

forming an initial set of fundamental frequencies from said start and corresponding stop
frames;

removing from said initial set those fundamental frequencies having corresponding time- 
domain energies less than an energy threshold to form a modified set of fundamental 
frequencies; 

removing from said modified set those fundamental frequencies having corresponding 
voicing probabilities less than a voicing probability threshold to form a twice modified set of 
fundamental frequencies; 

determining a median for each member of said twice modified set; 



Applicant : John D. Puterbaugh Attorney's Docket No.: 16759-003001 

Serial No.: 10/037,097 

Filed : December 31, 2001 

determining a mode for each member of said twice modified set; 

determining a distributional type for each member of said twice modified set with an 
associated class confidence estimate; and 

assigning a MIDI note number to each member of said twice modified set in response to 
said mode, said median, said distributional type and said class confidence estimate, whereby a 
note sequence is created. 
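
For illustration only, the two thresholding steps and the median and mode computations at the end of claim 34 might be sketched as follows. This sketch is simplified: it does not model the distributional type or the class confidence estimate, and it assumes the standard equal-tempered mapping from frequency to MIDI note number (A4 = 440 Hz = note 69). All names and thresholds are hypothetical.

```python
import math
import statistics

def hz_to_midi(frequency_hz):
    """Nearest MIDI note number, assuming A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(frequency_hz / 440.0))

def segment_pitch(f0, energy, voicing, energy_min, voicing_min):
    """Return (median note, mode note) for one segment, or None."""
    # Modified set: drop frames whose time-domain energy is too low.
    modified = [(f, v) for f, e, v in zip(f0, energy, voicing)
                if e >= energy_min]
    # Twice modified set: drop frames whose voicing probability is too low.
    twice_modified = [f for f, v in modified if v >= voicing_min]
    if not twice_modified:
        return None
    median_note = hz_to_midi(statistics.median(twice_modified))
    # Assumed: the mode is taken over per-frame note numbers.
    mode_note = statistics.mode([hz_to_midi(f) for f in twice_modified])
    return median_note, mode_note
```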

35. (original) The method of claim 34, further comprising the steps of:

creating a plurality of scales, one for each chromatic pitch class in said note sequence;

assigning a probability to each pitch class, said probability weighted according to scale
degree of each note;

comparing each said plurality of scales to said note sequence to find a best fit scale based
on occurrences of Tonic, Mediant, and Dominant of a particular scale in comparison to the note
sequence; and

selecting the scale with the highest degree of matching.

36. (original) The method of claim 35 wherein said step of assigning probability 
further comprises: 

assigning negative probability weights to the first, sixth, eighth, and tenth scale degrees 
and positive probability weights to the zeroth, second, fourth, fifth, seventh, and ninth scale
degree. 

37. (original) The method of claim 36 wherein assigning positive probability further 
comprises the step of assigning additional positive probability weight to the zeroth, fourth, and 
seventh scale degree. 
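
For illustration only, the weighting of claims 36 and 37 might be sketched as a simple scoring of twelve candidate tonics. The interpretation of "scale degree" as a semitone offset from the tonic, the weight magnitudes (1.0 per weight), and the zero weight for degrees the claims do not mention are assumptions of this sketch. The comparison step of claim 35, which is based on occurrences of Tonic, Mediant, and Dominant, is replaced here by a plain sum of weights.

```python
NEGATIVE = {1, 6, 8, 10}          # degrees given negative weight
POSITIVE = {0, 2, 4, 5, 7, 9}     # degrees given positive weight
EMPHASIZED = {0, 4, 7}            # degrees given additional positive weight

def degree_weight(degree):
    """Weight for a scale degree (semitones above the tonic, 0 to 11)."""
    if degree in NEGATIVE:
        return -1.0
    if degree in POSITIVE:
        return 1.0 + (1.0 if degree in EMPHASIZED else 0.0)
    return 0.0  # assumed: unmentioned degrees contribute nothing

def best_fit_tonic(midi_notes):
    """Score each of the 12 pitch classes as tonic; return the best."""
    scores = {tonic: sum(degree_weight((note - tonic) % 12)
                         for note in midi_notes)
              for tonic in range(12)}
    return max(scores, key=scores.get), scores
```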

38. (original) The method of claim 35 wherein said comparing step further comprises: 
ranking said plurality of scales in order of probability; and

comparing said each plurality of scales with said note sequence in order of probability.

39. (original) The method of claim 35, further comprising the steps of: 
examining a first pitch pair having a first note having a non-conforming pitch and a 
second note preceding the first; 

if said pitch pair does not conform to voice leading rules, then adjusting said first note
unless said adjustment causes dissonance in an adjacent pitch pair.



40. (currently amended) Apparatus for generating an identification signal comprising: 
a voice signal receiver; 

a translator having as its input a voice signal received by said voice signal receiver and 
having as its output a representation of discrete tones where an audio presentation of said 
discrete tones would be human-recognizable as a translation of said voice signal; 

wherein the translator includes an estimation module with outputs of a time varying 
feature associated with each of energy and spectral composition from the voice signal and a 
segmentation module responsive to the time varying features with an output of a segmentation of
the voice signal into a series of segments according to the time varying features, such that each in 
the series of output discrete tones is associated with a different segment in the series of segments. 

41. (original) The apparatus of claim 40 wherein said voice signal receiver comprises
an analog telephone receiver. 

42. (original) The apparatus of claim 40 wherein said voice signal receiver further 
comprises a voice-to-digital signal transducer. 

43. (original) The apparatus of claim 40 wherein said voice signal receiver further
comprises a recording device.

44. (original) The apparatus of claim 40 wherein said translator further comprises a
feature estimation module to determine values for at least one time-varying feature of said input 
signal. 

45. (currently amended) The apparatus of claim 44 wherein said translator further
comprises a pitch assignment module responsive to signal energy in each segment output by
said segmentation module.

46. (original) The apparatus of claim 44 wherein said feature estimation module 
further comprises a primary feature module, a secondary feature module and a tertiary feature 
module. 

47. (original) The apparatus of claim 46 wherein said primary feature module 
determines a plurality of values for each of time-domain energy, fundamental frequency, 
cepstral-domain energy, and voicing probability.

48. (currently amended) An apparatus for generating an identification
signal comprising:

a voice signal receiver; and

a translator having as its input a voice signal received by said voice signal receiver and
having as its output a representation of discrete tones where an audio presentation of said
discrete tones would be human-recognizable as a translation of said voice signal;

wherein said translator further comprises a feature estimation module to determine values
for at least one time-varying feature of said input signal;

wherein said feature estimation module further comprises a primary feature module, a
secondary feature module and a tertiary feature module; and



wherein said secondary feature module determines a plurality of values for each of the
secondary features of short-term average change in energy, short-term average change in
fundamental frequency, short-term average change in cepstral coefficient, and short-term average
change in cepstral-domain energy.

49. (original) The apparatus of claim 48 wherein each said secondary value is
computed over three consecutive frames of said input signal.

50. (original) The apparatus of claim 48 wherein said tertiary feature module
determines a plurality of values for at least one of said secondary features.

51. (original) The apparatus of claim 45 wherein said segmentation module further
comprises a first-phase segmentation module and a second-phase segmentation module.

52. (original) The apparatus of claim 51 wherein said first-phase segmentation
module groups a plurality of successive frames of said input signal into at least one region in
response to output of said feature estimation module.

53. (original) The apparatus of claim 52 wherein said region is a plurality of frames in
which a change in energy increases immediately followed by frames in which change in energy
decreases.

54. (original) The apparatus of claim 53 in which said region has a minimum number of
frames.

55. (original) The apparatus of claim 52 wherein said second-phase segmentation
module determines if said at least one region has a valid note start frame and if so, determines a
stop frame.

56. (original) The apparatus of claim 55 wherein said second-phase segmentation
module determines said valid note start frame in response to cepstral domain energy by 
determining whether a frame has a cepstral domain energy greater than a cepstral domain energy 
threshold preceded by a frame having a cepstral domain energy less than said cepstral domain 
threshold. 



57. (original) The apparatus of claim 55 wherein said second-phase segmentation 
module determines a valid note start frame if the fundamental frequency exceeds a fundamental 
energy threshold and if the non-cepstral domain energy exceeds an energy threshold. 

58. (original) The apparatus of claim 52 further comprising a segmentation post- 
processor to verify said start and stop frame in response to average voicing probability, average 
short-time energy, and average fundamental frequency of said start and stop frame. 

59. (original) The apparatus of claim 45 wherein said pitch assignment module 
assigns an integer between 32 and 83, said integer corresponding to the MIDI note number for 
pitch. 

60. (original) The apparatus of claim 45 wherein said pitch assignment module 
comprises an intranote pitch assignment subsystem and an internote pitch assignment subsystem.

61. (currently amended) An apparatus for generating an identification
signal comprising:

a voice signal receiver; and

a translator having as its input a voice signal received by said voice signal receiver and
having as its output a representation of discrete tones where an audio presentation of said
discrete tones would be human-recognizable as a translation of said voice signal;

wherein said translator further comprises a feature estimation module to determine values
for at least one time-varying feature of said input signal;

wherein said translator further comprises a segmentation module responsive to output of
said feature estimation module and energy of said input to segment said input signal into notes
and a pitch assignment module responsive to signal energy in each segment output by said
segmentation module;

wherein said pitch assignment module comprises an intranote pitch assignment
subsystem and an internote pitch assignment subsystem; and

wherein said intranote pitch assignment subsystem determines pitch in response to time- 
domain energy, voicing probability, median, and mode of each said segment output by said 
segmentation module. 

62. (original) The apparatus of claim 61 wherein said intranote pitch assignment 
subsystem further comprises an energy thresholding stage to remove from a set of fundamental 
frequencies for a particular segment those fundamental frequencies whose corresponding time- 
domain energy are less than an energy threshold to produce a modified set of fundamental 
frequency for said particular segment.

63. (original) The apparatus of claim 62 wherein said intranote pitch assignment
system further comprises a voicing thresholding stage to remove fundamental frequencies from
said modified set whose corresponding voicing probabilities are less than a voicing probability
threshold to produce a twice-modified set of fundamental frequencies for said particular
segment.

64. (original) The apparatus of claim 63 wherein said intranote pitch assignment 
system further comprises a statistical processing stage to compute a median and a mode for said
twice modified fundamental frequency set and to classify said segment as a distributional type in 
response to said median and said mode. 

65. (original) The apparatus of claim 64 wherein said segment is classified as a 
plurality of distributional types. 

66. (original) The apparatus of claim 64 wherein said intranote pitch assignment 
system further comprises a pitch quantization stage to assign a MIDI note number to said 
particular segment in response to said median, said mode and said distributional type. 

67. (original) The apparatus of claim 66 wherein said statistical processing stage 
further determines a decision confidence estimate corresponding to the determination of said 
distributional type, and said pitch quantization stage includes said confidence estimate in the
assignment of said MIDI note number.

68. (original) The apparatus of claim 60 wherein said internote pitch assignment
subsystem corrects pitches determined by said intranote pitch assignment subsystem.

69. (original) The apparatus of claim 68 wherein said internote pitch assignment
subsystem further comprises a key finding stage to assign a scale to a note sequence output by 
said intranote pitch assignment subsystem. 

70. (original) The apparatus of claim 68 wherein said internote pitch assignment 
subsystem further comprises a pairwise correction stage to examine a pitch and its preceding 
pitch for conformity to voice-leading rules, 

if a pair is determined to be dissonant according to said voice-leading rules, the internote
pitch assignment subsystem corrects the pitches of said pair if the pitch adjustment does not 
cause dissonance in an adjacent pair. 

71. (new) The method of claim 2 wherein the feature associated with energy includes 
a time-domain energy. 

72. (new) The method of claim 2 wherein the feature associated with energy includes 
a cepstral-domain energy. 

73. (new) The method of claim 2 wherein the time varying features according to 
which the voice signal is segmented include at least two features associated with energy. 

74. (new) The method of claim 2 wherein the feature associated with spectral
composition includes a cepstral coefficient.

75. (new) The method of claim 2 wherein the time varying features according to
which the voice signal is segmented further include a feature associated with periodicity. 

76. (new) The method of claim 75 wherein the feature associated with periodicity 
includes a fundamental frequency.

77. (new) The method of claim 75 wherein the feature associated with periodicity
includes a voicing probability. 

78. (new) A method for providing a ring tone for a device comprising: 
accepting a voice signal from the device; 

computing time varying measures of the voice signal; 

segmenting the voice signal into a series of segments using the time varying measures; 
determining a note for each of at least some in the series of segments; 
forming data representing a ring tone from the determined notes; and 
transmitting the data representing the ring tone to the device. 

79. (new) The method of claim 78 wherein the computed time varying measures of 
the voice signal include a time varying measure associated with each of energy, periodicity, and 
spectral composition of the voice signal. 

80. (new) The method of claim 79 wherein segmenting the voice signal includes 
using at least the measure of energy to identify regions of the voice signal. 

80. (new) The method of claim 80 wherein segmenting the voice signal further 
includes using the measure of periodicity to identify regions of the voice signal.

81. (new) The method of claim 80 wherein segmenting the voice signal further
includes for at least some of the identified regions, identifying a start and an end of a note in the 
region using at least the measure of periodicity of the voice signal. 

82. (new) The method of claim 80 wherein segmenting the voice signal further 
includes determining which regions include notes.

83. (new) The method of claim 78 wherein determining the notes includes using at
least the measure of periodicity. 

84. (new) The method of claim 83 wherein determining the discrete tones includes 
determining the notes using note contextual information.

85. (new) The method of claim 84 wherein determining the notes using contextual
information includes determining a key of the notes.